The Unreasonable Effectiveness of Tree-Based Theory for Networks with Clustering 
We demonstrate that a tree-based theory for various dynamical processes yields extremely accurate
results for several networks with high levels of clustering. We find that such a theory works well as
long as the mean intervertex distance $\ell$ is sufficiently small, i.e., as long as it is close to the value
of $\ell$ in a random network with negligible clustering and the same degree-degree correlations. We
confirm this hypothesis numerically using real-world networks from various domains and on several
classes of synthetic clustered networks. We present analytical calculations that further support our
claim that tree-based theories can be accurate for clustered networks provided that the networks
are "sufficiently small" worlds.



PACS numbers: 89.75.Hc, 89.75.Fb, 64.60.aq, 87.23.Ge

I. INTRODUCTION 

One of the most important areas of network science is the study of dynamical
processes on networks [?]. On one hand, research on this topic has provided
interesting theoretical challenges for physicists, mathematicians, and computer
scientists. On the other hand, there is an increasing recognition of the need to
improve the understanding of dynamical systems on networks to achieve advances in
epidemic dynamics [?], traffic flow in both online and offline systems [?],
oscillator synchronization [?], and more [3].

Analytical results for complex networks are rather rare, especially if one wants
to study a dynamical system on a network topology that attempts to incorporate even
minimal features of real-world networks. If one considers a dynamical system on a
real-world network rather than on a grossly simplified caricature of it, then
theoretical results become almost nonexistent. Furthermore, most analyses assume
that the network under study has a locally tree-like structure, so that it can
possess only very few small cycles, whereas most real networks have significant
clustering (and, in particular, possess numerous small cycles). This has motivated
a wealth of recent research concerning analytical results on networks with
clustering [?].

Most existing theoretical results for (unweighted) networks are derived for an
ensemble of networks using (i) only their degree distribution $p_k$, which gives the
probability that a random node has degree $k$ (i.e., has exactly $k$ neighbors) or
using (ii) their degree distribution and their degree-degree correlations, which are
defined by the joint degree distribution $P(k,k')$ describing the probability that a
random edge joins nodes of degree $k$ and $k'$. In the rest of this paper, we will
refer to case (i) as "$p_k$-theory" (the associated random graph ensemble is known as
the "configuration model" [23]) and to case (ii) as "$P(k,k')$-theory". The
clustering in sample networks is low in both situations; it typically decreases as
$N^{-1}$ as the number of nodes $N \to \infty$ [?].

We concentrate in this paper on undirected, unweighted real-world networks, which
can be described completely using adjacency matrices. It is straightforward to
calculate the empirical distributions $p_k$ and $P(k,k')$, which can then be used as
inputs to analytical theory for various well-studied processes. The results can
subsequently be compared with large-scale numerical simulations using the original
networks.
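As an illustration of how the empirical inputs $p_k$ and $P(k,k')$ might be extracted from a network, the following minimal sketch works from an edge list. The function name `degree_distributions` and the dictionary-based return format are assumptions made for this example, and nodes of degree zero are invisible to an edge list, so they are ignored here.

```python
from collections import Counter


def degree_distributions(edges):
    """Return (p_k, P_kk) for an undirected simple graph given as (u, v) pairs."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_nodes = len(degree)  # nodes of degree zero do not appear in an edge list
    p_k = {k: c / n_nodes for k, c in Counter(degree.values()).items()}

    # Count every edge in both orientations so that P(k, k') is symmetric.
    pair_counts = Counter()
    for u, v in edges:
        pair_counts[(degree[u], degree[v])] += 1
        pair_counts[(degree[v], degree[u])] += 1
    total = 2 * len(edges)
    P_kk = {pair: c / total for pair, c in pair_counts.items()}
    return p_k, P_kk


# Toy example: the path 0-1-2-3 (two end nodes of degree 1, two of degree 2).
p_k, P_kk = degree_distributions([(0, 1), (1, 2), (2, 3)])
```

With this normalization, summing $P(k,k')$ over $k'$ gives $k p_k / z$, where $z$ is the mean degree.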

In the present paper, we demonstrate that analytical results derived using
tree-based theory can be applied with high accuracy to certain networks despite
their high levels of clustering. Examples of such networks include university
social networks constructed using Facebook data [24] and the Autonomous Systems (AS)
Internet graph [25]. Specifically, the analytical results for bond percolation,
$k$-core sizes, and other processes accurately match simulations on a given
(clustered) network provided that the mean intervertex distance in the network is
sufficiently small, i.e., that it is close to its value in a randomly rewired version
of the graph. Recalling that a clustered network with a low mean intervertex
distance is said to have the small-world property, we find that tree-based
analytical results are accurate for networks that are "sufficiently small" small
worlds. In discussing this result, we focus considerable attention on quantifying
what it means to be "sufficiently small".

The remainder of this paper is organized as follows. In Sec. II, we consider
several dynamical processes on highly clustered networks and show that tree-based
theory adequately describes them on certain networks but not on others. In order to
explain our observations, we introduce in Sec. III a measure of prediction quality
$E$ and develop a hypothesis, inspired by the well-known Watts-Strogatz example of
small-world networks, regarding its dependence on the mean intervertex distance
$\ell$. We provide support for our hypothesis by numerical examination of a large
range of networks in Appendix B and by analytical calculations in Appendix A. We
discuss our conclusions in Sec. IV.

FIG. 1: (Color online) Bond percolation. Plots of GCC size $S$ versus bond occupation
probability $p$ for various real-world networks. These networks, which we also use as
examples in other figures, are (a) the Facebook network for University of Oklahoma
[24], (b) the Internet at the AS level [25], (c) the PGP network [26-28], and (d) the
power grid for the western United States [29, 30].



II. DYNAMICAL PROCESSES ON NETWORKS 

A. Bond Percolation 

We begin by considering bond percolation, which has been studied extensively on
networks. In bond percolation, network edges are deleted (or labeled as unoccupied)
with probability $1 - p$, where $p$ is called the bond occupation probability. One
can measure the effect of such deletions on the aggregate graph connectivity in the
limit of infinitely many nodes using $S(p)$, the fractional size of the giant
connected component (GCC) at a given value of $p$. (In this paper, we will use the
terminology GCC for finite graphs as well.) Bond percolation has been used in simple
models for epidemiology. In such a context, $p$ is related to the average
transmissibility of a disease, so that the GCC is used to represent the size of an
epidemic outbreak (and to give the steady-state infected fraction in a
susceptible-infected-recovered model) [23].
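The bond percolation process described above can be simulated on a finite graph in a few lines. The sketch below is one possible approach based on a union-find structure; it is an illustrative assumption and not necessarily the algorithm used for the numerical results in this paper. The function name `gcc_fraction` and the toy ring graph are hypothetical.

```python
import random


def gcc_fraction(n_nodes, edges, p, rng):
    """Largest-component fraction after keeping each edge with probability p."""
    parent = list(range(n_nodes))
    size = [1] * n_nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        if rng.random() < p:  # the edge is occupied
            ru, rv = find(u), find(v)
            if ru != rv:
                if size[ru] < size[rv]:
                    ru, rv = rv, ru
                parent[rv] = ru  # attach the smaller tree to the larger one
                size[ru] += size[rv]
    return max(size[find(x)] for x in range(n_nodes)) / n_nodes


# Toy example: a ring of 100 nodes at the two extreme occupation probabilities.
ring = [(i, (i + 1) % 100) for i in range(100)]
rng = random.Random(0)
s_full = gcc_fraction(100, ring, 1.0, rng)   # every edge kept
s_empty = gcc_fraction(100, ring, 0.0, rng)  # every edge deleted
```

Averaging `gcc_fraction` over many realizations at each value of $p$ gives a numerical estimate of the curve $S(p)$.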

Analytical results for GCC sizes for $p_k$-theory [31] can be found in Eq. (8.11) of
Ref. [23] and analytical results for $P(k,k')$-theory are available in Eq. (12) of
Ref. [?]. We plot these theoretical predictions in Fig. 1 as dashed red and solid
blue curves, respectively. In this figure, we use the following data sets as
examples: (a) the September 2005 Facebook network for University of Oklahoma [24],
where nodes are people and links are friendships; (b) the Internet at the Autonomous
Systems (AS) level [25], where nodes represent ASs and links indicate the presence of
a relationship; (c) the network of users of the Pretty-Good-Privacy (PGP) algorithm
for secure information interchange [26-28]; and (d) the network representing the
topology of the power grid of the western United States [29, 30]. We treat all data
sets as undirected, unweighted networks.

We performed numerical calculations of the GCC size using the algorithm in Ref. [33]
and plotted the results as black disks in Fig. 1. It is apparent from Fig. 1(a,b)
that $P(k,k')$-theory matches numerical simulations very accurately for the AS
Internet and Oklahoma Facebook networks, and we found similar accuracy for all 100
single-university Facebook data sets available to us. However, as shown in
Fig. 1(c,d), the match between theory and numerics is much poorer on the PGP and
Power Grid networks. The usual explanation for this lack of accuracy is that it is
caused by clustering in the real-world network that is not captured by
$P(k,k')$-theory. Note, however, that the Oklahoma Facebook network has one of the
highest clustering coefficients of the four cases in Fig. 1 even though it is
accurately described by its $P(k,k')$-theory.

FIG. 2: (Color online) Plots of $k$-core sizes versus $k$ for the real-world networks
from Fig. 1. The highest nonzero $k$-cores are
(a) $K_{p_k} = 91$, $K_{P(k,k')} = 98$, $K_{\rm num} = 107$;
(b) $K_{p_k} = 132$, $K_{P(k,k')} = 19$, $K_{\rm num} = 23$;
(c) $K_{p_k} = 7$, $K_{P(k,k')} = 16$, $K_{\rm num} = 31$; and
(d) $K_{p_k} = 6$, $K_{P(k,k')} = 7$, $K_{\rm num} = 19$.

Indeed, the global clustering coefficients (defined as the mean of the local
clustering coefficient over all nodes [29]) for the Oklahoma Facebook, AS Internet,
PGP, and Power Grid networks are 0.23, 0.21, 0.27, and 0.08, respectively. (See
Table I for basic summary statistics for these networks.) The clustering
coefficients for all 100 Facebook networks range from 0.19 to 0.41, and the mean
value of these coefficients is 0.24. These observations suggest that one ought to
consider other explanatory mechanisms for the discrepancy between theory and
simulations in Fig. 1(c,d).

In considering other explanations, note that the discrepancy between theory and
numerics in Fig. 1(c,d) does not arise from finite-size effects. To demonstrate
this, we rewired the networks using an algorithm that preserves the $P(k,k')$
distribution but otherwise randomizes connections between the $N$ nodes [52].
Because this scheme preserves the degree correlation matrix $P(k,k')$, we call this
the P-rewiring algorithm. Note that the ensemble of fully P-rewired networks is in
fact the ensemble of random networks defined by the $P(k,k')$ matrix of the original
(unrewired) network.
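One way to realize a rewiring that preserves $P(k,k')$ is sketched below. It is an assumption made for illustration and may differ in detail from the procedure of Ref. [52]: pick two edges, orient them as $(a,b)$ and $(c,d)$, and, if $a$ and $c$ have the same degree, replace them by $(a,d)$ and $(c,b)$. Every node keeps its degree, and the two degree pairs are merely exchanged between edges, so the joint degree counts are unchanged. The names `p_rewire` and `joint_degree_counts` are hypothetical.

```python
import random
from collections import Counter


def joint_degree_counts(edges):
    """Multiset of (smaller degree, larger degree) pairs, one entry per edge."""
    degree = Counter(node for edge in edges for node in edge)
    return Counter(tuple(sorted((degree[u], degree[v]))) for u, v in edges)


def p_rewire(edges, n_swaps, seed=0):
    """Apply up to n_swaps endpoint swaps that keep all degrees and the joint
    degree counts (hence the empirical P(k, k')) unchanged."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    degree = Counter(node for edge in edges for node in edge)
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        if i == j:
            continue
        # Orient both edges at random: (a, b) and (c, d).
        a, b = edges[i] if rng.random() < 0.5 else edges[i][::-1]
        c, d = edges[j] if rng.random() < 0.5 else edges[j][::-1]
        if degree[a] != degree[c]:
            continue  # the swap would change the degree-pair counts
        new_ad, new_cb = frozenset((a, d)), frozenset((c, b))
        if len(new_ad) < 2 or len(new_cb) < 2:
            continue  # the swap would create a self-loop
        if new_ad in present or new_cb in present:
            continue  # the swap would create a multi-edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new_ad, new_cb}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


# Toy graph: a 12-node ring plus two chords, so degrees 2 and 3 both occur.
original = [(i, (i + 1) % 12) for i in range(12)] + [(0, 6), (3, 9)]
rewired = p_rewire(original, n_swaps=50, seed=1)
```

Rewiring only a fraction of the links, rather than running the swaps to saturation, gives partially randomized networks.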

We show numerical calculations of the GCC sizes for these rewired networks with
blue squares in Fig. 1(c,d) and observe that they agree very well with the curves
produced from $P(k,k')$-theory. We conclude that the structural characteristics of
the original networks, rather than simply their sizes, must underlie the observed
differences between simulations and analytics.

FIG. 3: (Color online) Watts' threshold model, with threshold mean $\mu$ and variance
$\sigma^2 = 0.04$ for the networks from Fig. 1. We use the seed fraction
$\rho_0 = 0$ because the nodes with negative thresholds immediately turn on and act
as initial seeds. In other words, the effective seed fraction is given by the
cumulative distribution of thresholds at zero:
$[1 + {\rm erf}(-\mu/(\sigma\sqrt{2}))]/2$.

Also note that the agreement between $P(k,k')$- and $p_k$-theories in Fig. 1 is
better in panels (a) and (d) than in panels (b) and (c). This is because the Pearson
correlation coefficient $r$ of the end-vertex degrees of a random edge [23] has
smaller absolute values for the networks shown in panels (a) and (d) (0.074, with
the mean 0.063 over 100 Facebook networks, and 0.0035, respectively) than it does
for the networks in (b) and (c) ($-0.2$ and 0.24, respectively).
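The Pearson coefficient $r$ of the end-vertex degrees can be computed directly from an edge list, as in the following minimal sketch. The function name `degree_correlation` is an assumption for this example. Each edge is counted in both orientations, and the coefficient is undefined (returned here as NaN) when all nodes have the same degree.

```python
from collections import Counter


def degree_correlation(edges):
    """Pearson correlation of the degrees found at the two ends of an edge."""
    degree = Counter(node for edge in edges for node in edge)
    xs, ys = [], []
    for u, v in edges:  # use both orientations of every undirected edge
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys contain the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var > 0 else float("nan")


# Toy example: a star with three leaves is perfectly disassortative.
r_star = degree_correlation([(0, 1), (0, 2), (0, 3)])
```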



(We use $K$ to denote this maximal value of $k$.) For Fig. 2(a) and (b), we obtain
$K_{P(k,k')}/K_{\rm num} \approx 0.916$ and $K_{P(k,k')}/K_{\rm num} \approx 0.826$,
respectively. The corresponding values for Fig. 2(c) and (d) are
$K_{P(k,k')}/K_{\rm num} \approx 0.516$ and $K_{P(k,k')}/K_{\rm num} \approx 0.368$.

B. k-Cores

Figures 2, 3, and 4 show similar comparisons of analytical results versus numerical
simulations for other well-studied processes on networks. In Fig. 2, we plot the
$k$-core sizes of the networks. The $k$-core is the largest subgraph whose nodes all
have degree at least $k$ (counting only edges within the subgraph). The $p_k$-theory
for $k$-core sizes is given in Ref. [?] and the $P(k,k')$-theory is given by
Eq. (32) of Ref. [?]. As shown in Fig. 2(a,b), we again find very good agreement of
$P(k,k')$-theory with numerical calculations on the AS Internet and Facebook
networks and less accurate results for the other example networks. This can be
quantified by comparing the actual (numerical) result for the highest value of $k$
for which the $k$-core size is nonzero to the value that is predicted by
$P(k,k')$-theory.


C. Watts' Threshold Model 

Watts [36] introduced a simple model for the spread of cultural fads. It allows one
to examine how a small initial fraction of early adopters can lead to a global
cascade of adoption via a social network. The $p_k$-theory and $P(k,k')$-theory for
the average cascade size are given, respectively, in Ref. [37] and Ref. [35]. In
Fig. 3, we compare these theories with numerical simulations on populations with
Gaussian threshold distributions of mean $\mu$ and variance $\sigma^2 = 0.04$. The
cascade size shows a sharp transition as $\mu$ is increased. As with the other
processes that we discussed above, the position of this transition is accurately
captured by the theory for the Facebook and AS Internet networks but not for the
other examples.
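A minimal sketch of such a threshold cascade is given below. It assumes the usual rule that a node switches on permanently once the active fraction of its neighbors reaches its threshold, so that nodes with negative thresholds switch on immediately and act as seeds. The function names and the toy path graph are hypothetical; `effective_seed_fraction` simply evaluates the Gaussian cumulative distribution at zero.

```python
import math


def effective_seed_fraction(mu, sigma):
    """Probability that a Gaussian threshold (mean mu, std sigma) is negative."""
    return 0.5 * (1.0 + math.erf(-mu / (sigma * math.sqrt(2.0))))


def watts_cascade(adjacency, thresholds):
    """Final active fraction of the monotone threshold dynamics.

    Nodes are swept repeatedly until nothing changes; because nodes never
    switch off, the final state does not depend on the update order.
    """
    active = set()
    changed = True
    while changed:
        changed = False
        for u, nbrs in adjacency.items():
            if u in active:
                continue
            frac = sum(v in active for v in nbrs) / len(nbrs) if nbrs else 0.0
            if frac >= thresholds[u]:
                active.add(u)
                changed = True
    return len(active) / len(adjacency)


# Toy example: the path 0-1-2-3 with one seed (negative threshold) at node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
full = watts_cascade(path, {0: -0.1, 1: 0.5, 2: 0.5, 3: 0.5})
stuck = watts_cascade(path, {0: -0.1, 1: 0.6, 2: 0.6, 3: 0.6})
seed_at_zero_mean = effective_seed_fraction(0.0, 0.2)  # sigma^2 = 0.04
```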
FIG. 4: (Color online) SIS dynamics, which we display as plots of infected fraction
$I(t)$ versus time $t$ for the networks from Fig. 1. The parameters in Eq. (17) of
Ref. [1] are the recovery rate $\mu$ and the spreading rate $\lambda$. We use the
value $\mu = 1$ in all figure panels; we use $I(0) = 10^{-3}$ in panels (a)-(c) and
$I(0) = 0.002$ in panel (d); and we use $\lambda = 0.02$ in panel (a);
$\lambda = 0.2$ in (b) and (c); and $\lambda = 0.8$ in panel (d).



D. Susceptible-Infected-Susceptible Model 

In Fig. 4, we show a comparison between theory and numerical simulation results for
the time evolution of a susceptible-infected-susceptible (SIS) epidemic model on
various networks. Unlike the other processes that we have discussed, the theory for
this case (as given, for example, by Eq. (17) of Ref. [1]) is expected to apply
accurately only to the early-time development of the infection. In view of this
restriction, the results of Fig. 4 are consistent with those of Figs. 1-3. That is,
the $P(k,k')$-theory once again provides accurate results for certain networks for a
variety of processes of interest but is rather inaccurate for other networks.



III. MEASURE OF PREDICTION QUALITY 

We now aim to characterize the types of networks for which $P(k,k')$-theory can be
expected to give good results. Because Figs. 1-4 demonstrate that this
characterization holds for several processes, we will concentrate hereafter
primarily on the bond percolation case.



A. Watts-Strogatz Networks 

Using the small-world networks introduced by Watts and Strogatz [29], one can
conduct a systematic study of the effects of clustering $C$ and the mean intervertex
distance $\ell$. We start with a ring of $N = 10000$ nodes and connect each node to
its $z = 10$ nearest neighbors. We then randomly rewire a fraction $f$ of the links
in the network [53]. When $f = 0$, the values of $C$ and $\ell$ are both high. When
$f = 1$, the rewired network is connected completely at random, which gives it low
$C$ and $\ell$ values. For each value of $f$ between 0 and 1, we numerically
calculate the clustering coefficient $C_f$, the mean intervertex distance $\ell_f$,
and the GCC size $S_f(p)$ for all values of the bond occupation probability $p$
between 0 and 1. The difference between $S_f(p)$ and the $P(k,k')$-theory curve,
which we denote by $S_{\rm th}(p)$, gives a quantitative measure for the inaccuracy
of the theory for this particular value of the rewiring parameter $f$. We define the
error measure

$$ E_f = \frac{1}{M} \sum_{i=1}^{M} \left| S_{\rm th}(p_i) - S_f(p_i) \right| , \tag{1} $$

where $p_i = i/M$ for $i = 1, 2, \ldots, M$ are uniformly spaced values in the
interval $[0, 1]$. Taking the spacing $1/M$ to be sufficiently fine (we use
$1/M = 10^{-3}$) implies that the error measure $E_f$ approaches the average
vertical distance between the $S_{\rm th}(p)$ and $S_f(p)$ curves for
$p \in [0, 1]$.

FIG. 5: (Color online) Watts-Strogatz small-world network: $\ell_f - \ell_1$ (red
circles), $10 \times C_f$ (open squares), and $100 \times E_f$ (blue triangles) as
functions of rewiring fraction $f$. The inset shows $\ell_f - \ell_1$ and $C_f$ as
functions of $E_f$ for $f > 10^{-2}$. Observe the linear relation between $E_f$ and
$\ell_f - \ell_1$, which suggests that $\ell_f - \ell_1$ might be a good indicator of
how well the bond-percolation process on a network can be approximated by tree-based
theory.
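The error measure of Eq. (1) is a mean absolute difference between two curves sampled on a uniform grid, which the following sketch evaluates. The function name `error_measure` and the two toy curves are assumptions made for illustration; in practice the callables would return the theoretical and simulated GCC sizes.

```python
def error_measure(s_theory, s_numerics, M=1000):
    """Mean absolute difference of two curves at p_i = i / M, i = 1, ..., M."""
    total = 0.0
    for i in range(1, M + 1):
        p = i / M
        total += abs(s_theory(p) - s_numerics(p))
    return total / M


# Toy curves: the "numerical" curve is the "theory" curve shifted by 0.1 in p
# and clipped at zero.
E_toy = error_measure(lambda p: p, lambda p: max(0.0, p - 0.1))
```

For these toy curves the difference equals $p$ for $p \le 0.1$ and 0.1 otherwise, so the measure is close to (though slightly below) the horizontal shift of 0.1.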

In Fig. 5, we plot the values of $\ell_f - \ell_1$, $C_f$ (scaled by a factor of 10
for ease of visualization), and $E_f$ (scaled by a factor of 100) as functions of the
rewiring parameter $f$. For values of $f$ greater than $10^{-2}$, the quantities
$\ell_f$ and $E_f$ exhibit similar behavior, whereas $C_f$ remains near its $f = 0$
value of 2/3 until $f$ is much larger [54]. We highlight the similar scaling of
$\ell_f$ and $E_f$ in the inset of Fig. 5, in which we plot $\ell_f - \ell_1$
directly as a function of $E_f$ for $f > 10^{-2}$. The approximately linear
dependence that we observe contrasts with the clearly nonlinear relation between
$E_f$ and the clustering $C_f$ that we show in the same inset. This strongly suggests
that differences between theory and numerics are related more directly to the mean
intervertex distance than to the clustering coefficient.
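The $f = 0$ clustering value of 2/3 quoted above can be checked on a small unrewired ring lattice. The sketch below builds a ring in which each node is linked to its $z$ nearest neighbors and computes the mean local clustering coefficient; the function names are hypothetical, and a ring of 100 nodes is used instead of $N = 10000$ only to keep the example fast.

```python
def ring_lattice(n, z):
    """Ring of n nodes, each linked to its z nearest neighbors (z even)."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for step in range(1, z // 2 + 1):
            v = (u + step) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj


def mean_local_clustering(adj):
    """Average over nodes of (links among neighbors) / (possible such links)."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute zero
        nb = list(nbrs)
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nb[j] in adj[nb[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


lattice = ring_lattice(100, 10)
c_ring = mean_local_clustering(lattice)
```

For $z = 10$, each node has 45 neighbor pairs, of which 30 are linked, which gives the value 2/3.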



B. Real-World Networks and Additional Examples

The above results for Watts-Strogatz small-world networks motivate the examination
of a range of real-world networks in order to seek a clear relationship between an
error measure similar to (1) and some other characteristic of the network, such as
clustering or mean intervertex distance. For each network, we calculate the
inaccuracy of $P(k,k')$-theory in terms of the error $E$, which measures the distance
between the actual (numerically calculated) GCC size curve $S_{\rm num}(p)$ and the
theoretical prediction $S_{\rm th}(p)$:

$$ E = \frac{1}{M} \sum_{i=1}^{M} \left| S_{\rm th}(p_i) - S_{\rm num}(p_i) \right| . \tag{2} $$

Essentially, $E$ gives the average distance between the numerics (black disks) and
theory (solid blue curve) in Fig. 1. In Fig. 6(a), we show a scatter plot of
$\log_{10} E$ versus $\log_{10} C$, where $C$ is the clustering coefficient of each
network. We use logarithmic coordinates in Fig. 6 in order to fully resolve the range
of values for both variables, as they vary by one or more orders of magnitude.

We also include synthetic examples, such as Watts-Strogatz small-world networks and
clustered random networks generated using the recent models described in
Refs. [12, 13], which we now briefly recall. The fundamental quantity defining the
$\gamma$-theory networks of Ref. [13] is the joint probability distribution
$\gamma(k,c)$, which gives the probability that a randomly chosen node has degree $k$
and is a member of a $c$-clique (an all-to-all connected subgraph of $c$ nodes). With
$\gamma(3,3) = 1$ (and zero for other values of $k$ and $c$), each node in such a
network has degree 3 and is part of exactly one triangle. This is equivalent to the
$p_{1,1} = 1$ case in the clustered random graph model of Ref. [12], where
$p_{s,t}$ is the probability that a randomly chosen node is part of $t$ different
triangles and in addition has $s$ single edges (which do not belong to the
triangles). In each synthetic network, we P-rewire a fraction $f$ of links and show
our results for $f \in \{10^{-3}, 4 \times 10^{-3}, 0.04, 0.1, 0.4\}$.

In order to assess the strength of a relation between the theory error $E$ and some
characteristic of the network, we calculate the coefficient of determination $R^2$
using a linear regression. For the data in Fig. 6(a), we calculate
$R^2 \approx 0.087$ (using the points only and ignoring the connecting curves, which
help identify families of points). This relatively small value indicates that $C$ is
not a good predictor of the theory error across the set of networks that we tested
(see Table I). After examining a wide range of possibilities (see the scatter plots
in Appendix B), we found that the network measure that best correlates with the
error $E$ (on logarithmic scales) is $(\ell - \ell_1)/z$ (which gives
$R^2 \approx 0.94$), where $z$ is the mean degree and $\ell_1$ is the mean
intervertex distance in the version of the network that has been fully rewired while
preserving the joint degree distribution $P(k,k')$ [see Fig. 6(b)]. Recall that one
can think of such fully P-rewired versions of a network as random networks with the
same degree correlation $P(k,k')$ and size as the original network.

FIG. 6: (Color online) Scatter plots of $\log_{10} E$ versus (a) $\log_{10} C$ (with
$R^2 \approx 0.087$) and (b) $\log_{10}[(\ell - \ell_1)/z]$ (with
$R^2 \approx 0.94$).
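The coefficient of determination for a straight-line fit on logarithmic scales can be computed as in the following sketch. The function name `r_squared_loglog` and the synthetic data are assumptions for illustration only; no values from the paper's data sets are reproduced here.

```python
import math


def r_squared_loglog(xs, ys):
    """Least-squares line through (log10 x, log10 y); returns (slope, R^2)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    syy = sum((b - my) ** 2 for b in ly)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(lx, ly))
    return slope, 1.0 - ss_res / syy


# Synthetic check: an exact power law y = 3 x^2 gives slope 2 and R^2 = 1.
xs = [1.0, 10.0, 100.0, 1000.0]
slope, r2 = r_squared_loglog(xs, [3.0 * x ** 2 for x in xs])
```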

We can summarize our observations as follows. Given a network, we compare its mean
intervertex distance $\ell$ with the value $\ell_1$ in a random network of equal
size and degree correlation $P(k,k')$. If the difference $\ell - \ell_1$ is
sufficiently small (e.g., if it is less than $z/10$, as was the case in
Fig. 1(a,b)), then the $P(k,k')$-theory can be expected to accurately give the GCC
size, $k$-core sizes, and results for several dynamical processes (see Figs. 1-4).
For example, the AS Internet graph has $(\ell - \ell_1)/z \approx 3.3 \times 10^{-2}$
and all 100 Facebook networks have values much smaller than this. However, the
theory is not accurate for larger values of $\ell - \ell_1$. (For example, the PGP
and Power Grid networks have $(\ell - \ell_1)/z$ values of approximately 0.45 and
3.9, respectively.)
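The heuristic above amounts to a one-line calculation once $z$, $\ell$, and $\ell_1$ are known. The sketch below applies it to three rows of Table I; the function name and the dictionary layout are assumptions for this example, while the numbers are the tabulated values.

```python
def excess_length_ratio(z, ell, ell_1):
    """The quantity (ell - ell_1) / z that is compared against about 0.1."""
    return (ell - ell_1) / z


# (z, ell, ell_1) as listed in Table I.
table_rows = {
    "Power Grid": (2.67, 18.99, 8.61),
    "PGP Network": (4.55, 7.49, 5.40),
    "Facebook Oklahoma": (102.47, 2.77, 2.66),
}
ratios = {name: excess_length_ratio(*row) for name, row in table_rows.items()}
passes = {name: value < 0.1 for name, value in ratios.items()}
```

With these inputs, only the Facebook network falls below the threshold of 0.1, in line with the behavior seen in Fig. 1.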

Because the tree-based theory systematically gives accurate results for dynamical
processes on networks that are not locally tree-like when the intervertex distance
is small, it seems that there must be a deeper argument than is currently known for
the validity of such theories. We show in Appendix A that the error measure $E$
depends linearly on $\ell - \ell_1$ in a certain class of networks with zero
clustering. Although this theoretical result is restricted in its applicability, it
lends weight to our claim that $E$ depends primarily on $\ell - \ell_1$ rather than
on the clustering $C$.



IV. CONCLUSIONS 

At the beginning of this paper, we posed the following question: "How small must
small-world networks be in order for $P(k,k')$-theory to give accurate results?" Our
heuristic answer is that they must have a value for the mean intervertex distance
$\ell$ that differs from the mean intervertex distance in a random network with the
same $P(k,k')$ and number of nodes by no more than about 10% of the mean degree $z$.
Surprisingly, the level of clustering has much less of an impact on the accuracy of
$P(k,k')$-theory, which is why we found excellent matches between theory and
numerical simulations even in highly clustered graphs such as Facebook social
networks and the AS Internet network.

Although our presentation used bond percolation as our primary example, we
demonstrated in Figs. 1-4 that if $P(k,k')$-theory is accurate for percolation, then
it also works well for other processes. However, an absolute measure of accuracy
must, of course, depend on the process under scrutiny. For example, Fig. 7 shows a
comparison between theory and simulation results for Watts' threshold model in
which $\sigma = 0$, which implies that all nodes have identical thresholds equal to
$\mu$ (in contrast to Fig. 3). This example now exhibits different results for
theory and numerics even in the Facebook networks. This suggests that the
$\sigma = 0$ case of Watts' model is particularly sensitive to deviations of the
network from randomness and that this case provides a suitable testing ground for
new analytically solvable models of networks that include clustering [13, ?].

In summary, we have shown that for a variety of processes (including bond
percolation and $k$-core size calculations) tree-based analytical theory yields
highly accurate results for networks in which $\ell \approx \ell_1$ even in the
presence of significant clustering. Such graphs, which include the AS Internet
network and Facebook social networks, are definitively not locally tree-like, so
that the theory is working very well even in situations where the theory's
fundamental hypothesis is known to fail utterly. The fact that analytical results
for several dynamical processes can be expected to apply on "sufficiently small"
small-world networks increases the value of existing theoretical work and highlights
the types of processes for which improved analytical modelling of clustering effects
should most profitably be targeted. We hope that the results of the present paper
will motivate further research on the underlying causes of this "unreasonable"
effectiveness of tree-based theory for clustered networks.

FIG. 7: (Color online) Watts' threshold model, with threshold mean $\mu$ and variance
$\sigma^2 = 0$ (i.e., with uniform thresholds) for the networks from Fig. 1. We use a
seed fraction of $\rho_0 = 10^{-2}$.
Appendix A: Scaling of Prediction Error with Mean Intervertex Distance

We consider the class of networks for which one can define a branching matrix [38].
A branching matrix describes the connection probabilities in tree-like networks with
non-trivial structure, e.g., modular networks [39]. In this appendix, we derive how
the error measure $E$ defined in Eq. (2) depends on $\ell - \ell_1$ for a network
with a branching matrix when the network is close to fully P-rewired (i.e., when it
is close to a random network with the same degree correlation). We give the final
formula in Eq. (A6) below. Because clustering is negligible in these infinite
networks, $E$ cannot depend on the clustering coefficient $C$. In Fig. 6, we
illustrate both of these characteristics for real-world networks.

The branching matrix characterizes the average intervertex distance $\ell$ in a
network, and it also determines the bond percolation behavior. The largest
eigenvalue of the branching matrix, which we denote by $\lambda$, determines the
percolation threshold

$$ p_{\rm th} = \frac{1}{\lambda} . \tag{A1} $$

Additionally, an estimate of the mean intervertex distance can be written in terms
of $\lambda$ as [?]

$$ \ell \approx \frac{\ln N}{\ln \lambda} , \tag{A2} $$

where we recall that $N$ denotes the number of nodes in the network.

We suppose now that the network is almost fully P-rewired, and we consider how
values of $\lambda$ that differ from the fully P-rewired value (which we denote by
$\lambda_1$) affect the values of $\ell$ and $p_{\rm th}$. Note that it is easy to
calculate $\lambda_1$, as the branching matrix of a fully P-rewired network is given
in terms of the degree correlation matrix $P(k,k')$ by [?]

$$ B_1(k,k') = (k'-1) \frac{P(k,k')}{\sum_{k''} P(k,k'')} = (k'-1) \frac{P(k,k')}{k p_k / z} , \tag{A3} $$

and $\lambda_1$ is the largest eigenvalue of $B_1$. Moreover, for uncorrelated
networks produced using the configuration model, $\lambda_1$ is simply
$\sum_k k(k-1) p_k / z$. This implies in particular that $\lambda_1 = z - 1$ for
graphs in which all nodes have the same degree (such as P-rewired Watts-Strogatz
networks and the special cases of $\gamma$-theory networks used in Sec. III B).
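For the uncorrelated case, the eigenvalue $\lambda_1$ and the distance estimate of Eq. (A2) are simple enough to evaluate by hand or with the short sketch below. The function names are hypothetical. The example uses the regular degree distributions of the synthetic networks, for which the resulting estimates can be compared with the values of the Eq. (A2) column in Table I (4.19 and 3.14 for the Watts-Strogatz rows, 9.97 and 13.29 for the $\gamma$-theory rows).

```python
import math


def lambda1_configuration_model(p_k):
    """sum_k k (k - 1) p_k / z for a degree distribution given as {k: p_k}."""
    z = sum(k * p for k, p in p_k.items())
    return sum(k * (k - 1) * p for k, p in p_k.items()) / z


def mean_distance_estimate(n_nodes, lam):
    """Eq. (A2): ln N / ln lambda."""
    return math.log(n_nodes) / math.log(lam)


def percolation_threshold(lam):
    """Eq. (A1): 1 / lambda."""
    return 1.0 / lam


lam_ws = lambda1_configuration_model({10: 1.0})    # all nodes have degree 10
lam_gamma = lambda1_configuration_model({3: 1.0})  # all nodes have degree 3
ell_ws_large = mean_distance_estimate(10000, lam_ws)
ell_ws_small = mean_distance_estimate(1000, lam_ws)
ell_gamma_small = mean_distance_estimate(1002, lam_gamma)
ell_gamma_large = mean_distance_estimate(10002, lam_gamma)
```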

Considering only small deviations from fully P-rewired values, we write
$\lambda = \lambda_1 + \Delta\lambda$ and $\ell = \ell_1 + \Delta\ell$. Expanding to
linear terms, we find from (A2) that the excess length is

$$ \Delta\ell \approx -\frac{\Delta\lambda \, \ln N}{\lambda_1 (\ln \lambda_1)^2} . \tag{A4} $$

Similarly, we find from (A1) that the change in percolation threshold is

$$ \Delta p_{\rm th} \approx -\frac{\Delta\lambda}{\lambda_1^2} . \tag{A5} $$

If we now make the further assumption that $\Delta p_{\rm th}$ is approximately equal
to the error $E$ for the bond percolation process [this approximation is exact if
the effect of the perturbation is to shift the entire bond percolation curve $S(p)$
to $S(p + \Delta p_{\rm th})$], we obtain the relation

$$ E \approx \frac{(\ln \lambda_1)^2}{\lambda_1 \ln N} \, (\ell - \ell_1) . \tag{A6} $$

Although the scope of our analysis is obviously limited by our assumptions, Eq. (A6)
nevertheless supports our main claim that $E$ depends primarily on the excess length
$\ell - \ell_1$. Note that $C = 0$ for branching-matrix networks, so $E$ is
(trivially) independent of $C$; compare this to the results for the real-world
networks that are shown in Fig. 6(a). Moreover, the scatter plot of $\log_{10} E$
versus $\log_{10}[(\ln \lambda_1)^2 (\ell - \ell_1)/(\lambda_1 \ln N)]$ in Fig. 8
indicates that Eq. (A6) gives a good fit ($R^2 \approx 0.87$) even for real-world
networks.
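The linearization that leads to Eq. (A6) can be checked numerically. The sketch below perturbs $\lambda$ slightly away from $\lambda_1$, computes the exact changes in $\ell$ and $p_{\rm th}$ from Eqs. (A2) and (A1), and compares the exact threshold shift with the value that Eq. (A6) predicts from the excess length. The parameter values ($N = 10000$, $\lambda_1 = 9$, $\Delta\lambda = -0.05$) and the function names are assumptions chosen for illustration.

```python
import math


def ell_estimate(n_nodes, lam):
    """Eq. (A2): ln N / ln lambda."""
    return math.log(n_nodes) / math.log(lam)


def predicted_error(n_nodes, lam1, excess_length):
    """Eq. (A6): (ln lambda_1)^2 (ell - ell_1) / (lambda_1 ln N)."""
    return math.log(lam1) ** 2 * excess_length / (lam1 * math.log(n_nodes))


n_nodes, lam1, dlam = 10000, 9.0, -0.05

# Exact changes implied by Eqs. (A2) and (A1) for the perturbed eigenvalue.
excess = ell_estimate(n_nodes, lam1 + dlam) - ell_estimate(n_nodes, lam1)
shift_exact = 1.0 / (lam1 + dlam) - 1.0 / lam1

# Threshold shift predicted from the excess length via the linearized Eq. (A6).
shift_linear = predicted_error(n_nodes, lam1, excess)
relative_gap = abs(shift_linear - shift_exact) / shift_exact
```

A smaller eigenvalue lengthens paths and raises the percolation threshold, so both `excess` and `shift_exact` are positive here, and the two threshold shifts agree closely for a perturbation of this size.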
FIG. 8: (Color online) Log-log scatter plot of actual (numerical) values of $E$ for
real-world networks versus the values predicted by Eq. (A6), for which we
numerically calculate $\ell$ and $\ell_1$. We find that $R^2 \approx 0.87$; the
slope of the fitted line is 1.09.



Appendix B: Scatter Plots 

In this appendix, we show scatter plots of $\log_{10} E$ versus a variety of
possible predictors. Recall that $E$, which we defined in Eq. (2), gives an error
measure for bond percolation. We test for the dependence of $E$ on various
combinations of the mean degree $z$, mean intervertex distance $\ell$, and
clustering coefficients [?]. Recall again that $\ell_1$ denotes the value taken by
$\ell$ in a fully P-rewired version of a network (i.e., in a random network with the
same degree correlation and size).

The scatter plots show data points for real-world networks and for synthetic
Watts-Strogatz small-world networks and $\gamma$-theory networks, which are
described in Sec. III B. The dependence of $E$ on $\ell - \ell_1$ is clearly strong
(see the top row of scatter plots, which all have $R^2 > 0.9$), whereas the
dependence on clustering is weak (see the bottom row of scatter plots, which all
have $R^2 < 0.3$). Given the relatively small number of available data sets, we
cannot definitively select the best scaling function $F(z, \ell, \ldots)$ for the
relation $E \approx F(z, \ell, \ldots)(\ell - \ell_1)$, but the simple choice
$F = 1/z$ used in Fig. 6(b) and the scaling function
$F = \ln^2 \lambda_1 / (\lambda_1 \ln N)$ indicated by Eq. (A6) both give
satisfactory fits.
Network                          N        z        ell      ell_1   ell (A2)  C (3.6)  C (3.4)  r          Ref(s).

Real-world
Power Grid                       4941     2.67     18.99    8.61    7.85      0.08     0.10     0.0035     [29, 30]
PGP Network                      10680    4.55     7.49     5.40    2.66      0.27     0.38     0.23       [26-28]
AS Internet                      28311    4.00     3.88     3.67    2.56      0.21     0.0071   -0.20      [25]
RL Internet                      190914   6.34     6.98     5.25    3.17      0.16     0.061    0.025      [40]
Coauthorships                    39577    8.88     5.50     4.45    2.93      0.65     0.25     0.19       [41, 42]
Airports500                      500      11.92    2.99     2.76    1.62      0.62     0.35     -0.278     [43, 44]
Interacting Proteins             4713     6.30     4.22     4.05    2.96      0.09     0.062    -0.136     [45-47]
C Elegans Metabolic              453      8.94     2.66     2.55    1.93      0.65     0.12     -0.226     [48, 49]
C Elegans Neural                 297      14.46    2.46     2.33    1.84      0.29     0.18     -0.163     [29, 50]
Facebook Caltech                 762      43.70    2.34     2.26    1.55      0.41     0.29     -0.066     [24]
Facebook Georgetown              9388     90.67    2.76     2.55    1.79      0.22     0.15     0.075      [24]
Facebook Oklahoma                17420    102.47   2.77     2.66    1.79      0.23     0.16     0.074      [24]
Facebook UNC                     18158    84.46    2.80     2.68    1.87      0.20     0.12     7x10^-5    [24]

Synthetic
gamma-theory [gamma(3,3) = 1]    1002     3        13.15    8.06    9.97      1/3      1/3      N/A        [13]
gamma-theory [gamma(3,3) = 1]    10002    3        19.81    11.37   13.29     1/3      1/3      N/A        [13]
Watts-Strogatz (WS)              1000     10       50.45    3.29    3.14      2/3      2/3      N/A        [29]
Watts-Strogatz (WS)              10000    10       500.45   4.34    4.19      2/3      2/3      N/A        [29]



TABLE I: Basic summary statistics for the networks that we used in this paper. We have treated all real-world data sets
as undirected, unweighted networks and have computed the following properties: total number of nodes $N$; mean degree $z$;
mean intervertex distance $\ell$ in the original network; mean intervertex distance $\ell_1$ in the corresponding fully
P-rewired version of the network (i.e., in a random network with the original degree correlation); the mean intervertex
distance predicted by Eq. (A2) using the branching matrix corresponding to a random network with the original degree
correlation; two clustering coefficients (whose respective definitions are given by Eqs. (3.6) and (3.4) of Ref. [23]);
and the Pearson degree correlation coefficient $r$. The last column in the table gives the citation number(s) for the
data in the bibliography.